Building RDF Content for Data-to-Text Generation
Authors
Abstract
In Natural Language Generation (NLG), one important limitation is the lack of common benchmarks on which to train, evaluate and compare data-to-text generators. In this paper, we take a step in that direction and introduce a method for automatically creating an arbitrarily large repertoire of data units that could serve as input for generation. Using both automated metrics and a human evaluation, we show that the data units produced by our method are both diverse and coherent.
Similar resources
Domain-Adaptable Hybrid Generation of RDF Entity Descriptions
RDF ontologies provide structured data on entities in many domains and continue to grow in size and diversity. While they can be useful as a starting point for generating descriptions of entities, they often miss important information about an entity that cannot be captured as simple relations. In addition, generic approaches to generation from RDF cannot capture the unique style and content of...
The WebNLG Challenge: Generating Text from RDF Data
The WebNLG challenge consists in mapping sets of RDF triples to text. It provides a common benchmark on which to train, evaluate and compare “microplanners”, i.e. generation systems that verbalise a given content by making a range of complex interacting choices including referring expression generation, aggregation, lexicalisation, surface realisation and sentence segmentation. In this paper, w...
Application of Chinese Natural Language Generation in Semantic Web
RDF is the representation format of the Semantic Web. When querying RDF documents, the result is a sub-graph of the RDF data model or a number of triple statements. In this paper, we apply a natural language generation technique to render the result into multi-sentential text for human comprehension. We investigate the effect of discourse segmentation on the generation of anaphora and punctuation marks in C...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Representing Text Mining Results for Structured Pharmacological Queries
Several approaches integrating life science data using Semantic Web technologies have been described in the literature. However, these approaches have largely ignored the vast amount of content only available within the scientific literature. In this article, we present an RDF schema for text mining results that enables queries in SPARQL over textual and database data together. We show how real...